Antalya 26.3: Change the interface for Iceberg inserts with the catalog #1800

Open
zvonand wants to merge 3 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-100334

Conversation


@zvonand zvonand commented May 15, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Change the interface for Iceberg inserts with the catalog.
Deprecate settings: storage_catalog_type, storage_aws_access_key_id, etc (ClickHouse#100334 by @scanhex12).

Cherry-picked from ClickHouse#100334.


Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

scanhex12 and others added 2 commits May 15, 2026 19:29
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#100334 from ClickHouse/change_insert_interface

Change the interface for Iceberg inserts with the catalog

# Conflicts:
#	src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h
#	tests/integration/test_database_glue/test.py
#	tests/integration/test_database_iceberg/test.py
@zvonand added the releasy (Created/managed by RelEasy), ai-resolved (Port conflict auto-resolved by Claude), and backport (Backport) labels May 15, 2026

github-actions Bot commented May 15, 2026

Workflow [PR], commit [55cce08]

The cherry-pick of ClickHouse#100334 incorrectly wired the cluster table-function
branch of TableFunctionObjectStorage::executeImpl to call
configuration->getCatalog(context, StorageID(getDatabaseName(), table_name)).
ITableFunction::getDatabaseName returns the pseudo-database
_table_function, which does not exist in DatabaseCatalog, so
DataLakeConfiguration::getCatalog throws a LOGICAL_ERROR ("Database
_table_function not found"). In ASan/debug builds this aborts the
server, causing all subsequent tests in the shard to fail with
"Connection refused".

The non-cluster branch directly below already passes nullptr for the
catalog argument. Align the cluster branch with the same behavior, which
matches the upstream change_insert_interface branch.

Addresses 33 failing test(s) in Integration tests (amd_asan, db disk,
old analyzer, 2/6) on
#1800. After this fix the
still-failing set shrank from 33 -> 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zvonand commented May 17, 2026

RelEasy analyze-fails — PARTIAL

run completed at 2026-05-17T16:33:08Z

  • Head SHA: 706742c729 (feature/antalya-26.3/ClickHouse-ClickHouse-pr-100334)
  • Tests considered: 46 across 4 CI shard(s)
  • Outcomes: 1 done · 0 partial · 0 unrelated · 3 unresolved
  • Commits added by AI: 1 (✅ pushed)
  • Anthropic cost: $4.3313

Per-shard outcomes

✅ DONE — Integration tests (amd_asan, db disk, old analyzer, 2/6)

33 failed test(s) considered — +1 commit(s) — cost $3.0552
full report

AI narration

…(narration truncated; last 4500 chars)

DONE
All 33 tests passed after the fix.

Summary

Root cause: The cherry-pick of upstream PR ClickHouse#100334 wired the cluster branch of TableFunctionObjectStorage::executeImpl to call configuration->getCatalog(context, StorageID(getDatabaseName(), table_name)). ITableFunction::getDatabaseName returns the pseudo-database _table_function, which DatabaseCatalog does not contain, so the new DataLakeConfiguration::getCatalog threw LOGICAL_ERROR ("Database _table_function not found"). In the ASan build this aborted the server during test_join_with_distributed; every subsequent test in the shard then failed with Connection refused. The non-cluster branch immediately below already passed nullptr — the cluster branch is now aligned with it (matching upstream change_insert_interface).
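The failure mode described in the root cause can be illustrated with a toy model. This is a sketch in Python rather than the actual C++ code: the names echo the identifiers in the summary, but the registry contents, function bodies, and exception class are assumptions made purely for illustration.

```python
# Toy model only: names echo the C++ identifiers in the summary above,
# but every body here is an illustrative assumption, not ClickHouse code.


class LogicalError(Exception):
    """Stands in for an exception carrying ClickHouse's LOGICAL_ERROR code."""


# Hypothetical registry contents; the pseudo-database is deliberately absent.
DATABASE_CATALOG = {"default": "default-catalog", "datalake": "rest-catalog"}

# The pseudo-database name that table functions report.
TABLE_FUNCTION_DATABASE = "_table_function"


def get_catalog(database_name):
    """Resolve the catalog that belongs to a database, or raise."""
    if database_name not in DATABASE_CATALOG:
        raise LogicalError(f"Database {database_name} not found")
    return DATABASE_CATALOG[database_name]


def cluster_branch_before_fix():
    # The lookup uses the pseudo-database, which is never registered,
    # so this call always raises.
    return get_catalog(TABLE_FUNCTION_DATABASE)


def cluster_branch_after_fix():
    # Aligned with the non-cluster branch: no catalog is passed at all.
    return None
```

In the model, `cluster_branch_before_fix()` raises on every call, while `cluster_branch_after_fix()` skips the lookup entirely, which mirrors the alignment with the non-cluster branch.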

Per-test status (33/33):

  • test_storage_delta/test.py::test_join_with_distributed[fixed]
  • test_storage_delta/test.py::test_delta_kernel_internal_pruning[fixed]
  • test_storage_delta/test.py::test_count_from_cache[fixed]
  • test_storage_delta/test.py::test_writes[fixed]
  • test_storage_delta/test.py::test_partitioned_writes[fixed]
  • test_storage_delta/test.py::test_concurrent_queries[True][fixed]
  • test_storage_delta/test.py::test_writes_spark_compatibility[fixed]
  • test_storage_delta/test.py::test_write_limits[False-False][fixed]
  • test_storage_delta/test.py::test_write_limits[False-True][fixed]
  • test_storage_delta/test.py::test_write_limits[True-False][fixed]
  • test_storage_delta/test.py::test_write_limits[True-True][fixed]
  • test_storage_delta/test.py::test_column_mapping_id[fixed]
  • test_storage_delta/test.py::test_subcolumns[][fixed]
  • test_storage_delta/test.py::test_subcolumns[name][fixed]
  • test_storage_delta/test.py::test_subcolumns_2[][fixed]
  • test_storage_delta/test.py::test_subcolumns_2[name][fixed]
  • test_storage_delta/test.py::test_write_column_order[fixed]
  • test_storage_delta/test.py::test_type_from_storage_def[][fixed]
  • test_storage_delta/test.py::test_type_from_storage_def[name][fixed]
  • test_storage_delta/test.py::test_system_table[1][fixed]
  • test_storage_delta/test.py::test_truncate[fixed]
  • test_storage_delta/test.py::test_deletion_vector[False][fixed]
  • test_storage_delta/test.py::test_deletion_vector[True][fixed]
  • test_storage_delta/test.py::test_network_activity_with_system_tables[fixed]
  • test_storage_delta/test.py::test_table_statistics[fixed]
  • test_storage_delta/test.py::test_system_reload_delta_kernel_tracing[1][fixed]
  • test_storage_delta/test.py::test_early_return_limit[1][fixed]
  • test_storage_delta/test.py::test_struct_dotted_field_names[fixed]
  • test_storage_delta/test.py::test_snapshot_consistency[fixed]
  • test_storage_delta/test.py::test_snapshot_initialized_once_per_query[fixed]
  • test_storage_delta/test.py::test_insert_select_from_cluster_with_partition_pruning[0][fixed]
  • test_storage_delta/test.py::test_insert_select_from_cluster_with_partition_pruning[1][fixed]
  • test_storage_delta/test.py::test_concurrent_queries[False][fixed]

DONE

❌ UNRESOLVED — Integration tests (amd_asan, db disk, old analyzer, 5/6)

1 failed test(s) considered
full report

AI narration

Let me look at the iceberg-related changes in this PR, particularly the most recent cherry-pick commits.
Let me look at the relevant production code changes from the cherry-pick:
Let me fetch the full CI failure log to understand the test failure better:
Let me look at RestCatalog::createTable and download more detailed logs:
Now I see — REST catalog rejects URI without s3:// scheme. Let me investigate write_full_path_in_iceberg_metadata:
Let me check if test_namespace_filter existed before the PR change:
Let me check how the DataLakeCatalog database itself handles CREATE TABLE:
Let me look more carefully at where CREATE TABLE goes through DataLakeCatalog vs regular storage:
Let me check the IcebergMetadata::createInitial path to understand when catalog is used:
Let me check upstream PR ClickHouse#100334's test changes - what does the upstream test_namespace_filter look like?
Let me check if upstream has commits in our repo to view the original test:
Let me check namespace filter logic in DatabaseDataLake:
Now I understand the failure. Let me make the fix and run the test.

Plan:

  • The PR changes CREATE TABLE in a DatabaseDataLake to always register with the catalog (configuration->getCatalog(context, table_id) now returns the database's catalog instead of nullptr).
  • The Altinity-specific test_namespace_filter test does bare CREATE TABLE calls (lines 900–902) without write_full_path_in_iceberg_metadata=1, so the metadata location field is written as a relative path. The REST catalog then rejects it ("Invalid S3 URI, cannot determine scheme").
  • Fix: pass write_full_path_in_iceberg_metadata=1 on those bare CREATE TABLE calls, matching what the test helpers now do.

Now let me run the integration test to verify the fix. The change is test-only, no rebuild needed:
[runner] error streaming claude output: closing tag '[/home/zvonand/releases/ant263/build/programs/clickhouse]' at position 9117 doesn't match any open tag
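
The scheme problem named in the plan above can be sketched as a small check. This is a minimal Python illustration, not the REST catalog's actual validation: the helper name `has_uri_scheme` and both example locations are hypothetical.

```python
from urllib.parse import urlparse


def has_uri_scheme(location: str) -> bool:
    """Return True when the location carries an explicit scheme such as s3://."""
    return urlparse(location).scheme != ""


# Hypothetical metadata locations, for illustration only.
full_path = "s3://warehouse/ns/table/metadata/v1.metadata.json"
relative_path = "ns/table/metadata/v1.metadata.json"

for location in (full_path, relative_path):
    status = "has a scheme" if has_uri_scheme(location) else "has no scheme"
    print(f"{location}: {status}")
```

Under this model, a relative metadata location is the kind of value a catalog would reject with an error like "Invalid S3 URI, cannot determine scheme", while a full `s3://` path passes the check.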

❌ UNRESOLVED — Integration tests (amd_binary, 1/5)

2 failed test(s) considered — cost $0.5572
full report

AI narration

Let me look only at what this PR changes - the relevant cherry-pick commits, not the entire diff vs the base branch.
The PR diff is scoped to iceberg/data lake catalog code only. Let me verify the failing tests are unrelated.
Both failing tests are clearly unrelated to this PR's diff:

  1. test_grpc_protocol::test_ipv6_select_one — gRPC IPv6 channel timeout. The PR only touches iceberg/data lake catalog code (DataLakeConfiguration.h, DatabaseDataLake.cpp, TableFunctionObjectStorage.cpp, iceberg test files). It cannot affect gRPC protocol handling. Already annotated Marked as broken: INVESTIGATE - Internal test timeout.

  2. test_dirty_pages_force_purge::test_dirty_pages_force_purge — fails with Failed to find peak memory counter because neither /sys/fs/cgroup/memory/memory.max_usage_in_bytes nor /sys/fs/cgroup/memory.peak exists in the test environment. This is a cgroup/environment issue unrelated to the PR's iceberg changes. Already annotated Marked as broken: KNOWN: https://github.com/Altinity/ClickHouse/issues/1369.

I cannot write a one-sentence "this PR broke this test because " justification for either failure — the diff has zero overlap with gRPC or jemalloc/memory-tracking code. No code changes warranted.

git status --porcelain:
The working-tree changes (test_database_iceberg/test.py modified, tmp/ untracked) were present at session start — not from my work, so I leave them as-is.


Summary

  • test_grpc_protocol/test.py::test_ipv6_select_one[unrelated] — gRPC IPv6 channel timeout. PR diff is limited to iceberg/data lake catalog code (DataLakeConfiguration.h, DatabaseDataLake.cpp/.h, TableFunctionObjectStorage.cpp, registerStorageObjectStorage.cpp, iceberg test files, iceberg docs). Zero overlap with gRPC. Pre-existing failure already annotated INVESTIGATE - Internal test timeout.
  • test_dirty_pages_force_purge/test.py::test_dirty_pages_force_purge[unrelated] — fails at raise RuntimeError("Failed to find peak memory counter") because neither cgroup v1 nor v2 peak-memory sysfs path exists in the runner. Environment/infrastructure issue. Already annotated KNOWN: https://github.com/Altinity/ClickHouse/issues/1369.
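
The peak-memory lookup behind the second failure can be sketched as follows. This is an assumption about the shape of the test helper, not its actual code: the function name `find_peak_memory_path` is hypothetical, while the two sysfs paths and the error message are the ones quoted in the narration.

```python
import os

# The two sysfs paths named in the failure (cgroup v1 first, then cgroup v2).
PEAK_MEMORY_PATHS = (
    "/sys/fs/cgroup/memory/memory.max_usage_in_bytes",
    "/sys/fs/cgroup/memory.peak",
)


def find_peak_memory_path(candidates=PEAK_MEMORY_PATHS):
    """Return the first candidate that exists as a file, or raise."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Neither counter is exposed by the environment.
    raise RuntimeError("Failed to find peak memory counter")
```

On a runner that exposes neither file, the function raises regardless of what the PR changes, which is consistent with classifying the failure as environmental.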

No code changes made; no commits authored.

UNRELATED

❌ UNRESOLVED — Integration tests (amd_tsan, 2/6)

10 failed test(s) considered — cost $0.7189
full report

AI narration

This PR's actual changes (the 3 head commits) are limited to Iceberg catalog interface changes — nothing in src/Common/Scheduler/ or src/Access/ or related test fixtures.

Both failure groups are clearly unrelated:

  • test_scheduler_cpu_preemptive/* — CPU scheduler tests; failure #6 shows a TSan data race in SemaphoreConstraint::removeChild (scheduler code untouched by this PR), and the rest are Connection refused from the same dead server cascading.
  • test_token_roles_mapping/* — OAuth/token tests; the PR doesn't touch authentication code. Strong flaky-elsewhere signal (5 other tracked PRs).

No code changes needed; this is the "whole shard is unrelated" outcome.
The M tests/integration/test_database_iceberg/test.py and ?? tmp/ are pre-existing from before this session (visible in the initial git status). They're not mine to commit.


Final Summary

All 10 failures in this shard fall outside the scope of this PR's diff. The PR (#1800) is a cherry-pick of upstream ClickHouse#100334, and its own commits only modify Iceberg catalog interface code (src/Databases/DataLake/*, src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h, src/Storages/ObjectStorage/StorageObjectStorageConfiguration.h, src/Storages/ObjectStorage/registerStorageObjectStorage.cpp, src/TableFunctions/TableFunctionObjectStorage.cpp, and the two Iceberg integration tests). None of the failing tests touch any of that.

  • test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_equal] — [unrelated] CPU scheduler test; failure #6 in the batch shows a TSan data race in SemaphoreConstraint::removeChild (scheduler code, untouched by this PR), and the other failures cascade as Connection refused from the same dead server. Flaky-elsewhere on 1 other PR.
  • test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_longer_prd] — [unrelated] Same Connection refused cascade after the scheduler-side TSan race tore down the server. Flaky-elsewhere on 1 other PR.
  • test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_longer_dev] — [unrelated] Same cascade as above.
  • test_scheduler_cpu_preemptive/test.py::test_downscaling[cpu-slot-preemption-timeout-1ms] — [unrelated] Same cascade.
  • test_scheduler_cpu_preemptive/test.py::test_drop_workload_during_query — [unrelated] Same cascade.
  • test_scheduler_cpu_preemptive/test.py::test_create_workload_under_load — [unrelated] This is the root-cause test that captured the TSan data race in SemaphoreConstraint::removeChild; this scheduler code is not modified by this PR.
  • test_token_roles_mapping/test.py::test_mapped_guid_grants_mapped_role — [unrelated] OAuth/token role mapping test; PR does not touch authentication code. Failing on 5 other tracked PRs — near-conclusive master-side flake/infra issue.
  • test_token_roles_mapping/test.py::test_multiple_guids_grant_multiple_roles — [unrelated] Same as above; failing on 5 other tracked PRs.
  • test_token_roles_mapping/test.py::test_unmapped_guid_is_dropped_by_filter — [unrelated] Same as above; failing on 5 other tracked PRs.
  • test_token_roles_mapping/test.py::test_only_unmapped_guids_yield_no_roles — [unrelated] Same as above; failing on 5 other tracked PRs.

UNRELATED


🤖 Posted automatically by releasy analyze-fails. Re-run the command to refresh.


Labels

ai-resolved (Port conflict auto-resolved by Claude), antalya-26.3, backport (Backport), releasy (Created/managed by RelEasy)
